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ABSTRACT 

This work investigates some problems arising in application of 
(Kalman) linear filter theory to real problems, where practical es- 
timates must replace exact theoretical quantities in problem form- 
ulation. The principle objective is application of linear filter theory 
to random signal даечеспоп/сіафб нен. However, an example of 
classical estimation, error estimation in shipboard inertial navigation 
systems, is offered to illustrate general points discussed. A unified 
treatment of models for random time series is presented, including a 
comparative review of models which have been proposed and pro- 
cedures for obtaining model coefficients. Correlation detection of 
deterministic signals is discussed and the resulting principles ex- 
tended to the case of random signal detection. Application of 
linear filter theory to the problem is indicated. Finally, an ex- 
perimental study in random signal detection/clas sification is 
included. Experimental signals used are hydrophone recordings of 
sea noise and sea noise plus diesel submarine. Consistency of 
successful results obtained suggests practical utility of method in 


certain random signal detection/clas sification problems. 
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CHAPTER I 
INTRODUCTION 

It is frequently an integral requirement in engineering information 
processing systems that measurement data be employed in subsequent 
decision making or action, Invariably, the processing problem lies 
in a stochastic environment because of inaccuracies in the measure- 
ment process or random perturbations of the observed process during 
or between measurements. Some examples are determination of 
satellite orbit parameters from noisy radar or radio-location data , 
target track determination in the radar fire control problem, filtering 
of signals received in the presence of noise, and state estimation in 
linear dynamic systems from noisy transducer information. 

What might be termed "classical" estimation theory in engineer- 
ing has been based upon the pioneering work of Wiener published 
in 1942. This theory rests upon three main assumptions: (i) linear 
operations on received data, (ii) a quadratic loss criterion, and 
(111) stationary statistics describing the signal and noise processes. 
In applications, the objective is to obtain a specification of the 
electrical filter (the Wiener filter) giving optimum estimation, or 
isolation, of the signal from corrupting additive noise. Posed in 
terms of integral equations, the problem results in the Wiener-Hopf 
integral equation, which must be solved to obtain the impulse re- 
sponse of the optimum filter. Broad application of the theory has 
suffered from the difficulty of obtaining solutions for the Wiener- 
Hopf equation. 

In 1960, Kalman posed the estimation problem in terms of 
differential equations, using the concept of "state" from linear 
system theory. Characterization of the problem in terms of 
differential equations offers several advantages. Chief among them 
are suitability of the new approach for machine computation, re- 
laxation of the stationarity assumption, and the intimate relationship 


between the optimum filter and a linear model describing the signal 
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process. The work here is motivated by the contributions of Kalman 
and considers application of his techniques to problems in statistical 
communication theory. 

In the title, the term "real-time" means that computational effort 
for estimation procedures to be employed does not increase with in- 
creased numbers of observations. Further, following the practice of 
recent years, the collective term "estimation" is considered to include 
the traditional problems of smoothing, filtering, and prediction. Since 
digital computation is to be employed in examples, and since the 
required mathematics is less obscure, the present work will deal 
almost exclusively with estimation of discrete signals. For engineer- 
ing purposes, little generality is lost. The description of continuous 
filters can in general be obtained by considering the limiting case of 
the corresponding discrete filter as the sampling interval is allowed 
to approach zero. 

In Chapter II, basic assumptions underlying optimum estimation 
theory developed by Kalman are reviewed, and some practical con- 
siderations are discussed. As an example, the problem of estimating 
latitude error in shipboard inertial navigation systems is outlined, 
and application of the Kalman filter to the problem is described. 

Chapter III considers the important problem of finite parameter 
models for random time series, a subject not heretofore treated in 
a unified manner. First, the Spectral Distribution Function, intro- 
duced for the discrete time case by Wold in 1938, is discussed and 
its three constituent distribution functions noted. Then models which 
have been proposed for time series are considered with regard to their 
capacities for representing the various constituent functions. The 
combined autoregressive-moving average model is shown to be 
equivalent to the linear filter, and the relationship between their 
respective parameters is indicated. A discussion of parameter es- 
timation is then included. Present difficulties in solving for moving 


average model parameters is noted. It is shown that the approach of 
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Ho and Lee [16 ]to the identification problem amounts also to a 
recursive means for estimating autoregressive model coefficients. 
Finally, the recursive approach is employed in an example to obtain 
autoregressive model coefficients for a sample signal. The signal 
consists of a 50 cycle sine wave combined additively with noise from 
a laboratory random noise generator. A plot of z-plane poles of the 
model is given. Also included are comparative graphs of the auto- 
correlation functions and power spectral densities derived from the 
signal and model respectively. 

The detection of random signals in noise is considered in Chapter 
IV. As it happens, optimum detection of Gaussian random signals 
constitutes a simple (at least conceptually) extension to the idea of 
correlation detection of deterministic signals. In the case of Gaussian 
random signals, the optimum receiver forms a cross correlation of the 
received signal and a least squares estimate of the Gaussian signal 
given the received signal. The extension is shown following a brief 
development of correlation detection of deterministic signals. Diffi- 
culties involved in directly evaluating the Likelihood Function for 
each alternative classification are noted. Then employing the approach 
of Schweppe [31 ], it is shown that the Likelihood Function may be 
evaluated recursively. For random signals which are single time 
series, no matrix inversion is required. Quantities required in the 
recursion relations may be obtained from Kalman filter estimation of 
the Gaussian signal given the received signal. On-line computations 
may be greatly reduced by storage of short lists of required quantities 
which do not depend upon the received signal. Using stored lists, 
on-line multiply-add combinations per received signal sample are 
reduced to approximately 2n + 2 for each classification alternative, 
where n is the order of signal models used. 

An experimental problem in random signal detection investigated 
in Chapter V represents the first testing against actual signals of the 


methods developed or discussed in this dissertation. The experimental 
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signals used are hydrophone recordings of (a) a diesel submarine plus 
sea noise and (b) sea noise alone. After obtaining autoregressive 
models for the signal-plus-noise and noise processes, recursive 
Likelihood Function evaluation discussed in Chapter IV is employed to 
test performance of optimum linear classification on actual signals. 
Consistent correct classification is obtained within less than 100 
samples of signal record (representing less than 60 millisecs of signal 
record). Furthermore, classification efficiency is maintained over 
rather wide deviations between actual and assumed received signal 
parameters, Hence initial results suggest practical applicability of 
the method to certain real-time random signal detection/classification 


problems. 
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CHAPTER II 
LINEAR LEAST SQUARES ESTIMATION - 
THE KALMAN FILTER 

Interest in time-domain estimation techniques and applications 
has been stimulated in recent years by the contributions of R. E. 
Kalman [19 ]. Subsequent investigators have related Kalman's 
derivation and results to more familiar estimation techniques, thereby 
enhancing general understanding and utility of the new approach. Per- 
haps the simplest derivation of Kalman's results lies in the Bayesian 
approach employed by Peschon and Weaver [26 ], [35 ]. Another 
step toward better perspective was the demonstration by Lee [21 ] 
and Fagin [10 ] of the equivalence between Kalman's results and 
generalized recursive least squares estimation. A unified and 
general treatment of modern estimation theory by Deutsch [9 ] has 
recently been published. 

The purpose of this chapter is to review the basic assumptions 
underlying Wiener-Kalman filter theory and implications of the 
assumptions. Further, practical considerations arising in appli- 
cations are discussed. In this regard, a comparative discussion of 
the Kalman filter and generalized, recursive estimation of a Gaussian 
population mean is introduced to explicitly show the role of a priori 
information. Finally, a practical problem is considered to illustrate 
an application of the methods treated here. 

I. BASIC ASSUMPTIONS | 

The two basic assumptions underlying optimal filter theory are 
(1) linear operations on the data are to be performed, and (2) the 
criterion for goodness is mean squared error. Under these assump- 
tions, it is well known that only second order statistical moments 
are employed in the estimation scheme [8 ]. An immediate result is 
that for non-Gaussian signal processes, the optimal linear estimate 
obtained is the same as would have been obtained from the corres- 


ponding Gaussian signal process (i.e. possessing the same first 
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two moments). Moreover, the optimal linear least squares estimate 
of a Gaussian process is identical with the Maximum Likelihood 
estimate. The conclusion follows that linear least squares estimation 
may be improved upon only for non-Gaussian processes, and then 
only by an estimation procedure which takes into account third or 
higher statistical moments of the process. Conversely, in dis- 
cussions restricted to linear least squares estimation, assumptions 
of Gaussian process statistics can be made without effect on the 
estimation outcome. 
II, THE SIGNAL PROCESS MODEL 
It is commonly assumed in engineering literature on stochastic 
estimation that the scalar or vector-valued random time series to be 
processed can be regarded as the output of a linear dynamic system 
excited by white noise. Implications of this model will be discussed 
in greater detail in Chapter III. It is also assumed that observations 
of the system output are obscured by additive Gaussian white noise. 
Block diagrams of continuous and discrete descriptions of the model 
are given in Figs. 2-1 and 2-2 respectively. 
The general continuous linear dynamic system model may be 

described by the vector differential equations 

x(t) = F(t)x (t) + G (t)w (t) 

y(t) - H(t)x(t) 

Z(t) = y(t) жай 
where 

x(t) is the n x 1 vector of system states. 

F(t) is ann xn matrix describing the homogeneous system. 

w(t) is a p x1 vector of Gaussian white noise 

G(t) is an n x p matrix which distributes excitation noise 

across the states. 

y(t) is anm x 1 vector of system output. 

H(t) is an m x n observation matrix. 

v(t) is an m. x 1 vector of white Gaussian measurement 


noise 
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. Block Diagram of General Continuous 


Linear Dynamic System 


x 





Block Diagram of General Discretely 


Described Linear Dynamic System 
- Fig. 2-2 
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z(t) is an m x 1 vector of observations. 
Excitation and measurement noise Sources are assumed to be indepen- 
dent. Characteristics of the noise vectors w(t) and v(t) are: 
ELw(t)] = ELv(t)]=0 


ЕГм(Әм(т)11- 19 457 
(DENT T 


Blvitw(n)TI= 15 i2; 


For the corresponding discretely described linear dynamic system, 

the describing vector difference equations are 

x(k) = ®(k)x(k-1) + Г (k)w(k+1) 

у(к) = H(k)x(k) 

z (k) = y(k) + v (k) 
Corresponding to the continuous case, the noise vectors w(k) and 
y(k) are assumed to be samples of a vector-valued Gaussian noise 
process, and are assumed to be independent from sample to sample. 
For w(k), this assumption of uncorrelated samples represents no loss 
of generality, since the excitation noise is fictitious anyway, On 
the other hand, measurement noise encountered in certain applications 
may not be totally uncorrelated from sample to sample, In such cases, 
the correlated portion may be accounted for by augmenting the system 
order as necessary. The form of the resulting augmented system is 
simply arrived at, Actual numbers to describe the measurement error 
correlation may be obtained by considering the points discussed in 
Chapter III. 

III. THE OPTIMAL FILTER 
References cited above include derivations of the Kalman filter 

employing variously geometric (orthogonal projections in Hilbert 
space), Bayesian, and Maximum Likelihood approaches. Addition- 
ally, equivalence of the optimal filter and recursive linear least 
squares prediction has been indicated. It will therefore suffice for 
the work to follow to include here only the equations describing the 


optimal filter and the corresponding block diagram (Fig. 2-3). The 


2). 


x(k-1/k-1) 





Block Diagram of Kalman Filter 
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equations given are for the discrete formulation. 


B(k) 2 P(k/k-1)HT(HP(k/k-1)HÍ + R) "1 

x(k/k) = Фх(к-1/к-1) + B(k)[z (k) - H €x(k-1/k-1)] 
P(k/k) = [I - K(k)H] P(k/k-1) 

P(k+1/k) = &P(k/k) ç$ T +T Q TÍ 


In the equations given, P isannxn covariance matrix of the 
estimate, The double index is used to mean that P(k/k-1) is the 
variance of x(k/k-1), the optimal state estimate at time increment k 
given data only up to tine k-1 (x(k/k-1) = 9$ x(k-1/k-1)). 

While the equations above describe optimal linear estimation of 
the signal process model states, what frequently is required is the 
optimal estimate of some linear combination of the states, for ex- 
ample y = Hx. In order that the equations above be useful in such 
a case, it must be so that the optimal estimate of a linear combin- 
ation of states is the same linear combination of the optimal estimate 
of the states. A proof of this point is included in Appendix I. 

IV, A PRIORI INFORMATION 

As is readily noted, the difference equations above which de- 
scribe behavior of the Kalman filter or recursive least squares es- 
timation require a Starting point. Whatis needed is an initial 
estimate of the states x(1/0), and an initial covariance matrix 
P(1/0), indicating the uncertainty in the initial estimate. These 
quantities represent a priori knowledge about the process to be es- 
timated since they are required to be specified before any obser- 
vations are made. An intuitive feel for the role of x(1/0) and P(1/0) 
may be gained by comparing behavior of the Kalman filter and gen- 
eralized recursive estimation of a Gaussian population mean, out- 
lined in Appendix II. It is shown there that a priori information 
appears to the filter just as additional equivalent observations of the 
process to be estimated. 


It is difficult in many problems to arrive at meaningful values for 
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the a priori quantities required. If the signal process model is stable 
and the excitation noise covariance Q is known, a priori knowledge 
may be obtained directly from knowledge of the model. Motion of states 
of the model is a Gaussian random process, a linear function of the 
Gaussian excitation process. Since the excitation has zero mean, the 
expected value of states of the model given no observations is also 
zero, Thus x(1/0) equals the zero vector. Further, the covariance 
describing uncertainty in state values is the solution matrix S of 


the difference equation 
S = §S §T+TQ FT 


At any stage, the matrix P(k/k-1) is defined as the error co- 


variance matrix at time k given observations to time k-1. Hence 


P(1/0) = EL(x(1) - x(1/0))(x(1) - x(1/0))7} 
- ЕГх(1)х(1)11 
= 6 


Another suggested approach when a priori requirements present 
difficulty is to specify some reasonable value of x(1/0) and let 
P(1/0)2 cI, where c is a large number, It is motivated by the fact 
that weighting of measurement data is in inverse proportion to un- 
certainty in the data relative to uncertainty of the current best estimate 
before data receipt. Thus effects of the initial assumptions may be 
washed out by the incorporation of new data [21 ], [10 ]. 

In the example below, both the above methods are tried and com- 
parative results noted. Further use of the two methods is made in 
the experimental study of Chapter V, There the latter method is used 
in the recursive model determination program, and the former is used 
in the program for computation of Likelihood Functions. 

V. AN EXAMPLE - ESTIMATION OF SYSTEM 
ERROR IN SHIPBOARD INERTIAL NAVIGATION SYSTEMS 
An example to which the techniques described above may be very 


naturally applied is the problem of error estimation in shipboard inertial 
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navigation systems (SINS). Gyroscopes used in the system to provide 
a stable inertial platform are subject to certain random drifts. Hence 
in order to maintain the necessary accuracy of the system over in- 
definite periods at sea, it is necessary to determine system errors 

for use in recalibrating or resetting the SINS. Such an error deter- 
mination necessarily depends upon information from external sources; 
in this case upon LORAN position information and position/heading 
information from star sightings, etc. However, information from these 
sources is imperfect due to measurement inaccuracies. Additionally, 
it is desired to have to resort to LORAN or other observations as 
infrequently as possible, so optimal use of observational data is 
indicated. These conditions, optimal estimation from noisy obser- 
vations, comprise justification for considering the problem here. 

The problem will be pursued here only far enough to illustrate 
application of the method. Thus only the problem of estimating 
latitude error will be discussed in detail. Without further theo- 
retical complication, the treatment may be expanded to encompass 
remaining aspects of the error estimation problem. Optimal control 
techniques may then be employed in system reset, a step under 
current development [5 ]. 

1. Background 

As stated above, SINS employs gyroscopes to stabilize a gimbal- 

supported inertial platform. Remarks to follow will relate to the 


four-gimbal system! 


, the innermost two gimbals of which are termed 
the "latitude gimbal" and "heading gimbal." Upon the innermost 
gimbal, the latitude gimbal, are mounted three stabilizing single- 


degree-of-freedom gyros with mutually perpendicular input axes. 


lThe present discussion will be only detailed enough to outline 
the estimation problem to be considered. More detailed discussion 
of inertial navigation systems may be found in [24 ], [32 ]. 
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This basic structure will be discussed further below. 

Two coordinate systems fixed in the gimbals describe operation 
of the system. They have been termed the "instrumented equatorial 
coordinate system", fixed in the latitude gimbal with axes E', Y', 
and P', and the “instrumented local coordinate system", fixed in 
the heading gimbal with axes x', Y', and z'. These axes correspond 
to "earth axes" E, Y, P, x, and z shown in Fig. 2-4. The latitude and 
heading gimbals are pivoted to each other on the Y' axis. 

Identification of axes shown in Fig. 2-4 follows: 

E is parallel to intersection of local meridian plane with 
plane of equator, positive outward. 
Y is in local horizontal plane and extends eastward. 
ie P is parallel to polar axis of earth, positive toward 
North pole. 
x is in local horizontal plane and extends northward. 
Z is parallel to local vertical, positive downward. 

Having introduced the coordinate systems employed in SINS, 
orientation of the stabilizing gyros may be more completely described. 
As stated, they are mounted on the innermost, or latitude gimbal, 
in which is fixed the instrumented equatorial coordinate system, The 
gyros are termed the equatorial, latitude, and polar gyros, and are 
mounted with their input axes parallel to the E', Y', and P' axes 
respectively. Drift rates of the three gyros are denoted, respectively, 
Бу ер, єт, апа єр. 

In order that the instrumented equatorial coordinate system 
correspond accurately with the earth's equatorial coordinate system, 


the latitude gimbal is torqued about the P' axis at the rate Q+ à where 


Q= Earth rate 
À = Ship longitude rate obtained via an accelerometer on 
the inertial platform. 
Additionally, the heading gimbal is torqued with respect to the latitude 
gimbalata rate (-L') equal to the negative of the SINS indicated 
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Equatorial and Local Coordinate Systems 


Fig. 2-4 





Coordinate System Misalignment 


Fig. 2-5 
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latitude rate (also obtained via an accelerometer on the inertial 
platform). In operation, the system indicated latitude is given by the 
angle between the latitude and heading gyros, and indicated longitude 
is given by the angle between the latitude gimbal and a reference. 

This completes a brief description of system operation. 
2. Error Generation 

For the present discussion, operation is assumed to be in "damped 
inertial" mode, reducing Schuler oscillation errors to comparatively 
insignificant dimensions. Nevertheless, it remains necessary to 
estimate errors resulting from imperfect system resets and the gyro 


drifts € and ер mentioned above. 


E’ C 
When ad system is misaligned, the instrumented and earth equa- 
torial coordinate systems do not coincide as desired. Misalignment 
consists of small angular displacements фу about the Y axis and 
Фр about the E axis. The situation is depicted in Fig. 2-5, where 
the vector P' has unit length. 
In the absence of gyro drift, the P' axis is motionless relative to 
inertial space, while the earth coordinate system rotates about P 
at the earth rate. The projection of P' in the equatorial plane thus 


describes a circle centered at the origin and repeated every 24 hours. 


Differential equations describing the circle are 


e 
Фү, -Q ФЕ 

& 

Фр = QO, 

Introducing gyro drift, ¢ ү, acquires an additional rate of change 
equal to е, the latitude gyro drift. Similarly, ór. acquires an addition- 
al rate of change equal to ер caused by equatorial gyro drift, The 
equations become 


e 


L = -QAE + єт, 
др = ОФт + EE 


If it is assumed that the drift rates єт апа €p are constant, the 
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equations may be written 


а= ер SL 
dt Ü ór Ua) - ACE - s 


d € 
"dt (Фр- d-, = 0(ф + "2d 

The locus is still a circle but is centered off the origin as shown in 
Fig. 2-6. 
3, Error Estimation 

Importance of the error locus lies in its relation to physically 
measurable errors, Under the common assumption of perfect knowledge 
of the local vertical (i.e. z' and z vertical), it has been shown 


[32 ] that 


do = CL 


Фр € HcosL 
where 
CL- Latitude error 


CH 


Heading error 

In general, only position information is conveniently and regularly 
available. Hence only 2. сап Бе directly estimated, By use of 
three or more latitude error observations, the circle may be determined 


to within the Y axis displacement SE . 
2 


Current operating procedure involves a manual graphical fit of 
a 24 hour period sine wave through plotted succeeding values of 
latitude error versus time, The method is patently both time con- 
suming and vulnerable to human error. In the discussion below, 
this same reduced problem will be treated with a recursive least 
squares procedure (Kalman filter), While, as stated, no further 
theoretical complication is involved if treatment is extended to the 
complete problem, all essential features needed to show the 
application are present in the reduced problem unobscured by 


arithmetic. 
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Error Locus with Constant Gyro Drifts 
Fig. 2-6 
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Linear Model for Latitude Error 
Fig. 2-7 
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4. Recursive Least Squares Error Estimation 


In order to employ the Kalman filter, it is necessary to specify a 
model for the observed process in terms of a linear dynamic system 
excited by white noise. Furthermore, it is necessary to possess a 
priori information about the amount of noise contaminating obser- 
vations and about the mean squared value of the observed process. 

Since only estimation of latitude error (which equals ер is to 
be considered in the reduced problem, a simple form for the model 
may be written by inspection. It appears in block diagram form in 
Fig. 2-7. The error locus of Fig. 2-6 may be considered to be pro- 
jected upon the E axis. Hence error estimation in the reduced 
problem amounts to estimation of the unknown amplitude, phase, 


and offset (- SE ), The singular model of Fig. 2-7 has a null G 
Q 


matrix, and its behavior is thus determined solely by the unknown 
initial conditions. In Fig. 2-7, the pure integrator leg preserves the 
initial offsét;- and the sécond order legrrepresents the 24 hour period 
sine wave of unknown amplitude and phase, Using this model, a 

priori quantities used in the filter are x(1/0) » 0 and P(1/0) = 10,0001. 
The model of Fig. 2-7 was converted to discrete form suitable for 

digital computation. The resulting matrices describing it are given 


below (for T » 1 hour). 


1 0 0 1 
Г = (0 Ф =|0 JOY IO0 FI — i 
EL —-.057299 99659 0 


In order to obtain a priori quantities in the alternate manner 
noted above , the model of Fig. 2-7 may be made absolutely stable 
by the addition of damping in each leg. The precise 24 hour period 
of oscillation and essentially constant offset (constant drift rate) 
suggests that only slight damping be used, Thena value of q , 
variance of the white excitation noise, is specified such as to 


obtain steady state peak oscillation of about 5-10 minutes of arc, 
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a rough mean peak oscillation observed in practice. 

In the simulation, latitude measurements were assumed available 
every three hours, and were all assumed to be of the same quality. 
Assumed standard deviation of measurement noise was five. It was 
assumed that an updated latitude error estimate, with its corres- 
ponding variance, was desired hourly. Hence updating without data 
is done in the intervening hours between receipt of measurements. 

Filter computations are straightforward, and need not be elab- 
orated upon. Results of the simulation are displayed in Figs. 2-8, 
2-9 for the unexcited model, and Figs. 2-10, 2-11 for the stable 
model. Comparison of the resulting estimates shows that both 
models achieved approximately the same estimation accuracy. The 


simulation program listing is included in Appendix III. 
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Fig. 2-9 
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Observed and Estimated Latitude Error - Excited Model 
Fig. 2-10 
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CHAPTER III 
MODELS FOR RANDOM TIME SERIES 
I. GENERAL 

The purpose of this chapter is to treat some valid questions which 
arise when application of Kalman estimation techniques is considered 
for problems in random signal detection and pattern recognition. In 
such problems, the nature of possible random time series to be pro- 
cessed becomes quite general, and some uncertainties faced are: 

(1). Is it still a justifiable assumption that the time series 
to be processed is the realization of a Gaussian process? 

Gi): Can it be expected that an acceptable finite-parameter 
representation of the signal process may be found? 

(iii). What finite-parameter schemes have been considered in 
the literature on time series analysis ?- 

(iv). What schemes are available for obtaining the various 
finite-parameter representations ? 

In considering further the questions above, it will be helpful 
to review pertinent work done in the field of time series analysis. 
In recent years, works of Tukey [4 ], Grenander and Rosenblatt 
[11, 12 ], Parzen (25 |, and Wold [39 J are representative and 
contain references to additional literature on the subject. The central 
problem in time aeries analysis is the following: one is allowed to 
observe a time series, a partial realization of a stochastic process, 
and on the basis of this observation is required to make statistical 
inference about the time series. In general, the required inference 
concerns the possible mechanism generating the time series or pre- 
diction of future behavior of the series. 

Work done to the present in time series analysis has primarily 


dealt with single time series! , resulting from stationary (and ergodic) 


lit is only the single time series, i.e. the single-output signal 


generating process, which it is desired to consider here, Certain 
work has been done in the analysis and model determination of 
multivariate time series, and may be found in [3 ], [40 ], [30]. 
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random processes with finite first and second moments. In fact, the 
great majority of effort in the analysis of random time series has been 
directed toward spectral analysis or correlation analysis, employing 
only the first and second moments of the generating process, This 
has come about for two reasons. First, it is possible to accomplish 
a great deal with only second order quantities, and second, the use 
of higher order moments introduces as yet intractable difficulties 
[13]. Thus the theory which has been developed has been tailored 
to fit the methods used. Of course in applying the theory, it is 
necessary to keep in mind that there are time series about whose 
statistical behavior only limited information is conveyed by the 
correlation or spectrum, Two time series may have the same auto- 
correlation function or power spectral density, yet possess quite 
different statistical properties. A classic example is provided by 
the so-called random telegraph signal and the output of a proper 
low-pass filter excited by Gaussian white noise. 

From the above discussion and the fact that the Kalman tech- 
nique employs linear processing, the answer to (i) is clear. As 
long as the processing scheme to be used employs only the first 
two moments of the time series to be analyzed, then without loss of 
generality, the subject time series may be replaced by any time series 
having the same autocorrelation function [6 ]. In particular, it may 
be replaced by a Gaussian time series, a sample function of a 
Gaussian random process, which is completely described by its 
first and second moments. 

Remaining sections of this chapter will deal with the questions 
of finite parameter representations of time series and methods of 
estimating the parameters. 

II, FUNDAMENTAL MODELS FOR RANDOM TIME SERIES 

As stated above, the principal matter of concern in time series 

analysis is the making of inferences about the structure of a random 


process {y(t) } based on observation of a partial realization of the 


37 


process, a time series y(1), y(2), . . . y(N). In general, analysis 
requires a mathematical model of the time series. For practical reasons, 
it is highly desirable that the model be of finite-parameter form, and 
moreover, that it be of as low an order as possible. 

Since attention here is being restricted to linear processing 
schemes, for which as seen above, the random series may be adequate- 
ly represented by its first and second order moments, use in analysis 
of either the power spectrum or autocorrelation function is equally 
valid. For investigating modeling potentialities, it turns out to be 
more convenient to deal with the power spectrum than the covariance 
sequence.! 

In 1938, Wold [38 ] showéd! that for the discrete time сазе, 
there exists a non-decreasing bounded function 8 (а), defined for 
-T su <s T , such that 

fT 
E T- 0,2 03 
-T 
The function 8 (w) is called the Spectral Distribution Function of the 
time series. This theorem by Wold corresponds to that of Khintchine 
in 1933 for the continuous case, where he showed that there exists 
a non-decreasing bounded function &3(w) , defined for --s ws ©, 


such that 


oo 


Е т 


= © 
(w) can be uniquely written [5 ] as the sum 


3 (w) =3ac(w) + 56 о) + Bsc(w) 


where the three distribution functions have the following properties. 


l The work here will deal with discrete-time random time series. 


Though the supporting theory has been developed for the continuous 
time case, it is considerably more complicated [3 ] than that for 
discrete-time sequences. Besides, for engineering purposes, useful 
results in the continuous case may often be had by taking discrete- 
time results to the limit. 
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Sac(w) is absolutely continuous and is the integral of a non-negative 
function F(w) called the Spectral Density Function or Power Spectral 
Density of the time series. The function &Sd(u) is a discrete func- 
tion which represents the contribution of power at discrete frequencies 
to the Spectral Distribution Function of the time series. It may be rep- 


resented as 
Sd(w) = LA 3 (wi) 


where 


A lwi) = 3(witd) - B(wi - 0) 


The remaining constituent function &3sc(w) is a singular continuous 
function, constant except ona set of Lebesgue measure Zero [11 1. 
It has been asserted [13 ] that this part does not appear to be mean- 
ingful observationally, and it is commonly neglected in the literature. 

Having noted the elemental parts of the general Spectral Dis- 
tribution Function ð (w), it will be constructive to examine models 
which have been proposed for time series and to consider their 
capacities for representing the various constituent functions. 
Method of Hidden Periodicities 

One of the earliest models in the history of time series analysis, 
the so-called scheme of hidden periodicities was introduced in 1898 
by Schuster in connection with meteorological studies. The model 
assumes that the observed random sequence {y(k) | тау Бе 
represented as the sum of a mean value function m(k) and white 


noise v(k) 
y(k) = m(k) + v (k) 


where the mean value function m(k) represents a systematic 


oscillation 


m 
m(k) = Z A, 
i=1 


, cos(w,k + ġ;). 


In the mode1, the amplitudes А, , angular frequencies Ws and 
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phase angles Ф; аге constants, some of which are known and some 
to be estimated. The samples of white noise v(k) have the properties: 
o? Y; 


Байы Elv(k)v(j)] = 1 E 


Obviously, the spectral distribution function of any signal gen- 
erated by such a harmonic model would correspond only to the dis- 
crete spectral distribution function above. But time series observed 
in nature generally possess at least a mixed spectrum [11 ] , whose 
spectral distribution function contains both the discrete and abso- 
lutely continuous components. Furthermore, to a good approximation, 
many physically observed time series may be considered to have 
Spectra corresponding only to the absolutely continuous spectral 
distribution function Sac(w). Hence, as might now be predicted 
by the theory (still nonexistent at the time the scheme of hidden 
periodicities was introduced), such a harmonic model did not perform 
satisfactorily for many time series analysis problems [39 ]. 

Linear Models 

Of greater practical interest are the various linear models which 
have been proposed. The spectra of such models correspond to ab- 
solutely continuous spectral distribution functions. This class of 
models includes the Autoregressive and Moving Average schemes 
introduced in the late 1920's by Yule and Slutzky, and the linear 
filter excited by white Gaussian noise used by Rice [29 ]. Models 
of the class are defined below, and techniques for estimating their 
parameters are considered in the next section. 
AmioheSnessrvesbrocess 

The current value of the time series is represented as the sum 
of a linear combination of the past n values of the time series and 
an uncorrelated disturbance w(k) , where n is the order of the 
process, 

In difference equation form, the autoregressive process is 


y(k) = by y(k-1) + boy(k-2) * . . . byy(k-n) * w(k) 
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where the disturbance w(k) has the properties, 


2 nés 
E[w(k)]= 0 ЕГм(юм()1-197 853 
0 кя) 
A necessary and sufficient condition that the model be convergent is 


that all roots z, of the polynomial 


1 


lie within the unit circle. 
Process of Moving Averages 

For a model of order m, the current value of the time series is 
represented as a linear combination of the past m values of the 


uncorrelated disturbance, In equation form, 


y(k)2a. w(k)*a ,w(k-1)*... a w(k-m) 
0 l m 


where w(k) is defined as for the autoregressive model above. 
Linear Filter Model 
In discrete-time formulation, this model takes the form 
x(k) = Фх(к-1) + Г w(k) 
y(k) = Hx(k) 
where 
X(k) = Vector of states at time k of the linear dynamic system 
comprising the filter. 
Ф = State transition matrix of the linear dynamic system. 
IT = Vector which distributes the excitation noise across states 
of the filter. 
H = Observation vector, which relates the scalar observable 
to states of the filter. 
y(k) = Value of the random time series at time k. 
w(k) » White Gaussian excitation noise as defined above. 
Models a and b above, on the one hand, and model c on the 
other, have been employed by different groups of researchers for 


seemingly very different problems. Models a and b, though initially 
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proposed for studies of sunspot activity, have in recent years been 
most extensively employed in econometrics research and have to a 
degree come to be associated with that field of study. Conversely, 
the linear filter model has been studied almost exclusively by the 
engineering community concerned with random signal representation 
and stochastic control problems. It is important, however, to note 
their basic similarities. 

The picture may be cleared by noting the capacity of each model 
to approximate the power spectral density of a time series to be 
modeled. As stated above, the spectra of linear models correspond 
to dac(u), the absolutely continuous spectral distribution function. 
The related power spectral density F(w) may be approximated ar- 
bitrarily closely by a rational function in ш? of finite order [12 ]. 
Now in the z domain, for a spectral density which is approximately 
zero beyond the appropriate Nyquist frequency, there is a corres- 


ponding spectral density function rational in z, so that 


(Ze BZ) 
Q(z) 
where P and Q are polynomials. 
Since the present discussion concerns discrete-time analysis, 
it is the relation between the linear models above and A(z) which 
is of interest. Also, since the time series to be dealt with are real, 


A(z) may be written as follows for z on the unit circle: 


1 
Q (z) B(z)B( 1) 


Z 
where 
=a 2F zn-l 4, . ,a 
0 1 m 
a n n=l 
ob ZS +h Zz +... eD 
Biz) 0 1 n 


Whittle [36 ] has shown that a rational spectral function where À has 


order m and B has order n always corresponds to a process structure 
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b.x(k) * b.x(k-1) *... b x(k-n) 
0 1 n 
=a w(k) +a w(k-1) +. . . a w(k-m) 
0 1 m 


Such a model structure has been termed a mixed autoregressive- 
moving average model, the obvious title. 

From the discussion above, relationships between the linear 
models mentioned and the power spectral density is clear. The linear 
filter model and mixed autoregressive-moving average model are 
equivalent and provide a rational approximation to the power spectral 


th 


density. A moving average model of order m produces an m”™ order 


polynomial approximation to the spectral density function, and an 


th order autoregressive model produces an nth order inverse poly- 


n 
nomial approximation to the spectral density function. 

This section has discussed questions (ii) and (iii) raised in 
section I. A remaining question of key practical importance is 
parameter estimation for the model selected. It will be considered 
in brief detail next. 

III, ESTIMATION OF LINEAR MODEL PARAMETERS 

It has been suggested above that linear models have valuable 
practical utility for representing time series. Final employment in 
real applications, however, rests upon the ability to estimate the 
necessary model coefficients. The problem differs quite significantly 
from the problem of plant identification in control system engineering. 
In the present case, inputs to the filter are known only in statistical 
terms, and as a matter of fact, are not physical quantities at all, 
but are fictitious. 

In the section above, the linear filter and mixed autoregressive- 
moving average models were asserted to be equivalent. It is there- 
fore helpful to express the linear filter in such a form that corres- 
pondence between parameters of the two models may be easily shown. 


The "standard canonical form" of [21 ] is such a form. It is obtained 
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through a linear transformation of the state variables, and involves 
no alteration of the filter input-output relation, The observability 
requirement for such a transformation is no restriction in the present 
situation. In the "standard canonical form", the three ' matrices de- 


scribing the filter input-output relation have the form: 


У] | 1 
| ES 
Y2 | 
| 
А | 
n Ф = | о! І Н = 
е | 
| 0 
à eL 
Z3 -b, ә ° Ф Ф 7b, -7bi 


After normalizing coefficients of the mixed autoregressive- 
moving average model to obtain b. equal unity, remaining auto- 
regressive coefficients b. directly equate to elements of the Ф 


matrix bottom row as shown. Equation of moving average coefficients 


а, with filter parameters is not as direct and is given below. 
ap 1 | У 
0 
: a қ (m=n-1) 
Sb po 530 c bi 1 Ул 


The relation above may be obtained either by generalizing from low 
dimensional systems or deriving directly for the general case [14 ]. 
No general techniques presently exist for estimating the para- 
meters а, апа D of a linear filter in a straightforward manner from 
observed data. However, if the time series can be considered to be 
sufficiently represented by a moving average model, then the 


following relations may be solved to obtain the coefficients of the 
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polynomial A(z). 


адар... a fag r (0) 
0 a aa 1| |81 r(1) 
0 Š е : 
° а? $ ° 
0 0а, llan r(m) 


where r(T) are values of the observed autocorrelation function of 
lag T. As it happens, there are 2"! possible ways in which zeros 


of the product А(2)А(1) may be assigned to À to obtain the same 
2. 


power spectral density (and autocorrelation function). Hence there is 
an indeterminacy of order 2™ in the resulting coefficients of A(z). 
This indeterminacy may be resolved by using the fact that the time 
series is real and by forcing all zeros of A(z) to lie on or within the 
unit circle. The net effort, however, has been sufficient to dis- 
courage use of moving average models of very high order. Itisa 
useful observation to note that the autocorrelation function ofa 
series generated by a moving average model is zero for lags greater 
than m, 

Estimation of parameters for an autoregressive model is more 
straightforward. The problem is to obtain a least squares fit for 


the coefficients b, in the relation 


“5 


b y (k-i) = w(k) 


i=0 


subject to the restriction that the roots of B(z) should fall within 
the unit circle. The results of such an approach are the consistent 
set of linear equations [13 ] (bo = 1), where the r(rT) are values of 


the observed correlation function of lag T. 
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г(0) г(1] ----  rín-1) р Г) 
E» wu b | 
| 4 2 | 
| г(0) г(1) Е. | 
| r(n-1) ----- — r(1)r(0) IP ] xo d 


An alternate approach, a recursive approach, is also possible for 
estimating autoregressive model parameters. Least squares estimation 
problems with uncorrelated residuals can be conveniently handled 
within the framework of the Kalman filter [10 ]. Such an approach 
has been employed by Ho and Lee [16 ] to estimate the coefficients 


of B(z). From the autoregressive model definition, 


y(k) = фуу(К-1) + фу(К-2)+.... фу(к-п) +у(К) 
where 

Фу = -b,, 05-7 -b, etc 
Let 

977 (646. а 

у, = (у(к-1) у(б-2).... у(к-п)) 
Then 


For the corresponding Kalman filter, the required quantities are: 


P "O rl =o x=¢ 
Ф =1 H=h=y, Re=r=Elv(k)7] 


Ata glance, itis clear that the computations required are simple. 
Since the filter output is a scalar, no matrix inversion is required. 
Further, gain computations are simplified here since P(k/k-1) = 
P(k-1/k-1). It may first appear that ignorance of r will present 


difficulty. However, normalizing the weighting computations with 
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respect to r and employing the suboptimal but practically useful 
device [21 ] of a very large P(1/0) avoids the problem. 

The recursive approach is quite flexible. If it is desired to 
estimate the parameters of the best autoregressive model for the 
observed time series, each sample in turn is processed. On the other 
hand, if it is reyuired to estimate the denominator coefficients of a 
linear filter having a numerator of order m, then processing..only 
every m+] samples guarantees uncorrelated residuals (recalling 
that correlation in a moving average model is nonzero only over m 
lags). Then the numerator coefficients may be obtained by pro- 
cessing the residuals left after treating the observations with the 
denominator in an autoregressive fashion. 

The approach of Ho and Lee has been used in experiments to 
estimate the model coefficients for an autoregressive representation 
of a random signal. Results are given in the next section. 

IV, EXAMPLES OF MODEL DETERMINATION 

Various facets of the discussion above may be made clearer by 
an illustrative example. Recursive parameter estimation was employed 
to obtain an autoregressive model for a contrived laboratory signal. 
The model to be found was arbitrarily chosen to be of sixth order. 

The contrived signal consisted of the additive combination of 
a 50 cycle sine wave and the output of a laboratory random noise 
generator. Originally generated in analog form, the signal was 
sampled by an analog-to-digital converter under control of a CDC 
160 computer, and the digital samples were stored on magnetic 
tape in 4000-sample blocks E Since initial analysis disclosed little 
noise energy above approximately 400cps, the sampling interval was 
effectively increased from the original 400 ps to 1200 ws by using 
only every third sample, giving a Nyquist frequency of 416 cps. Of 


course, the resulting sample size used in obtaining the model was 


lThe very useful programs used to accomplish the digitizing/ 
recording operation and necessary subsequent unpacking are due to 


N. Barrett [2 ]. 
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reduced to a third of the original 4000 samples. 

Two runs were made, using signal-to-noise ratios of approx- 
imately Odb and -10db. Results are displayed in Figs. 3-1 to 
3-6. Plots of z-plane poles of the resulting models are given in 
Figs. 3-1 and 3-4, The remaining figures for each run contain com- 
parative plots of autocorrelation functions and power spectral den- 
sities derived from the signal and model. A visual indication of 
model efficiency is provided by the comparative power spectral 


density plot. 
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Model Poles - 50 cps Sine Wave in Random Noise - 
0 db Signal/Noise Ratio 
Fig. 3-1 
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Autocorrelation Functions Computed from Data and Model 
S/N = 0 db 
Fig. 3-2 
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Power Spectral Densities Computed from Data and Model 
S/N = 0 db 
Fig. 3-3 
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Fig. 3-4 Model Poles - 50 cps Sine Wave in Random Noise - 
-10 db Signal/Noise Ratio 
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Fig. 3-5 Autocorrelation Functions Computed from Data and 
. Model - S/N = -10 db 
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Fig. 3-6 Power Spectral Densities Computed from Data 
and Model - S/N = -10 db 
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CHAPTER IV 
DETECTION OF RANDOM SIGNALS IN NOISE 

The problems of detecting both deterministicland random signals 
in the presence of additive Gaussian noise have received considerable 
attention in the literature [14 ], [22 ], [23], [34], [37 ]. Itis 
well known that for the former case, the optimum receiver is a cross 
correlatom or matched filter, which correlates the input with a replica 
of the transmitted signal. 

Random signals considered previously, and to be considered here, 
have been assumed to be of the somewhat restrictive but tractable 
Gaussian form with zero mean and knowmautocorrelation. (This 
assumption will be discussed in greater detail below.) Kailath [18 ] 
has shown that the optimum receiver for such Gaussian signals in 
Gaussian noise forms a cross correlation of the input and a least 
squares estimate of the signal based on tnput received in the inter- 
val under scrutiny. 

The results obtained in [18 ] suffer practical drawbacks, however. 
Critical computational difficulties are faced in obtaining the least 
squares estimate when the number of samples considered is large, 
as it is required to invert a matrix of the same dimension as the 
number of samples. Moreover, statistical descriptions of possible 
signals, which were assumed known, are in practice not easy to 
obtain. | 

The present work approaches this problem in a manner somewhat 
similar to that of Weaver [35 ], and employs the results of Schweppe 
[31]. The serious computational difficulties noted above are 
avoided. Specifically, estimation techniques developed in recent 
years by Kalman and others [19 ], (201, 126 ], (27 ] are employed. 


Furthermore, in view of the similarities between correlation signal 


lin the discussion to follow, a deterministic signal will be 
defined as one whose form is completely known at the receiver, and 
a random signal will be one whose description is known at the 
receiver only in statistical terms. 
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detection and linear discrimination methods in pattern recognition, it 
is felt that the approach to be followed here may constitute a new and 
more powerful approach to certain pattern recognition problems. 

For simpler and clearer presentation of results, the analysis to 
follow will treat the discrete, or sampled, situation. However, the 
techniques to be employed are equally applicable to continuous pro- 
cessing. 

I, OUTLINE OF THE PROBLEM 

The problem to be considered is the following: a signal con- 
sisting of a finite number N of samples of a Gaussian process 
y(t) is received during a limited time interval T in the presence of 
additive Gaussian noise. The additive noise and signal process are 
assumed independent. In practice, real time processing is necessary, 
and it is required to announce at the end of the interval whether or 
not the signal y(t) has occurred. It will be seen later that the present 
limitation to a single signal is not restrictive, The same approach 
applies to the problem of detecting and discriminating members of a 
finite set í v”) of signals. 

A word about the assumption of Gaussian signals is in order. 
Aside from transmitted signals whose forms are simply not known at 
the receiver except in terms of their first and second order statistics, 
it happens that the scatter-multipath structure of ionospheric prop- 
agation perturbs transmissions into Gaussian signals [27 ]. The 
same situation is produced in radar and sonar by clutter and rever- 
beration resulting from the effects of a multitude of small random 
scallerers, 

II, CORRELATION DETECTION OF DETERMINISTIC SIGNALS 

It will be convenient to first consider the problem of correlation 
detection of deterministic signals in order to establish notation 
conventions, more clearly outline the problem, etc. 

From the problem statement above, the task at hand is clear. 


The received signal, a mixture of the transmitted signal and additive 
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white noise, must be processed to yield an indication of which trans- 
mitted signal ina finite set of possible signals {y™(t)} was received. 
It is, of course, desired that the indication be as reliable as possible 
under the circumstances. 

The essential feature of the problem is extraction of information 
about the transmitted signal from the mixture of transmitted signal and 
additive noise. Information concerning the transmitted signal is 
clearly more germane to the issue than is determination of signal-to- 
noise ratios [37 ], though the information sought and signal-to-noise 
ratio may often be monotonically related. Possessing as much infor- 
mation about the transmitted signal as it is possible to isolate from 
the received signal available for processing, one may proceed with 
the required decisions. The decision making will commonly also con- 
sider other factors, such as relative penalties for incorrect decisions, 
any a priori information, etc. 

As mentioned, only a mixture, here called z(t), of the trans- 
mitted signal y(t) and noise v(t) is available for processing, i.e. 
z(t) = yl(t) + v(t). The noise v(t) is assumed to be white Gaussian 
noise with mean zero and known variance r. Since the discrete case 
is being considered here, what is available for processing is a se- 
quence of samples z(k) of z(t), where z(k) = yl(k) * v(k). 

At this point, it will simplify notation if we define vectors уі 
and z, where 

уі -уі(1), у(2).....у(м) 
2-20), 20)..... 240) 

From the problem statement, it can be seen that what must be 
performed is a multiple-alternative hypothesis test [22 ]. Stated 
another way, as viewed by the receiver, y is a discrete random 
variable having the possible values yl i=l, m. All information 
at the receiver about the transmitted signal y is contained in the con- 
ditional probabilities of y given the observations z, p(yl/z) [37 ]. 


It is these conditional probabilities which will be considered further. 


og 


Remaining aspects of the decision-making problem (relative costs 
of errors, etc.) depend upon the specific situation. 

The conditional probabilities p (vi/z) may be stated in terms of 
given or easily determined quantities through use of Bayes Formula 
Ll 

p(yl/z) » PG/X )p(y) 
p(z) 
On the right hand side, p(yl) is the a priori probability of signal 
yl (assumed given). p(z) is not a function of the transmitted signal 
yl and may be considered a normalizing factor. The remaining quantity, 
p (z/y) is a function both of the transmitted and received signals, and 
must be considered in detail. 


Expanding the notation, 


p(zZy1) = p[z(1), z(2) . . ..2(N)/y10),v$Q) Уа 


Due to whiteness of the additive noise, consecutive samples of 


n(t) are uncorrelated. Hence 


z(1)/yt (1), yi (2)... yh (N) = 2(1)/y2 (1) — N[y (1), r] 


z(2)/y3 Q0), y3 2) . . wy P(N) = 2(2)/yi (2) ~ N{yi(2), 1] 


C Le, 
Furthermore, z ()/ yl (j) and z (k)/y! (k) are independent random 


variables for all j Zk. Therefore, 
Б) 2/2): z N vi. (2) ns 


= TT P[zG)/yiG)J 
j=l 


In more compact form, 


! -N N : 
р(2/уі) = (2 тг) 7 ехр-1 z [20 - yo)? 
2r j=l 
N N | N | 
- C exp -l- [> 20)2 + z yl? -2r zio) 
er "jel ј=1 m 
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Inside the brackets, the first term represents received signal energy, 
a factor which will cancel in later comparisons of conditional prob- 
abilities. It can therefore be incorporated into C, The second term 
represents energy of the transmitted signal yl. If all the possible 
transmitted signals have been normalized so as to possess the same 
energy, this term can also be incorporated into C. Otherwise, it 
may be carried along as an additional constant Cl, The situation 
becomes, 

z G)y! G) 


p(z/yi) = CCi exp A 
r j=] 


ІШ - 


J 


It is seen that the optimum receiver, which furnishes the con- 
ditional probabilities p(yl/z) to the decision making processor, 
forms a cross correlation of the received signal and the transmitted 
signal and uses the result in computing the desired conditional 
probability. The resulting optimum receiver structure is shown in 
Fig. 4-1. 

The subject of correlation detection is far from exhausted. In 
particular, the decision philosophy to be employed and details there- 
of have been omitted. The purpose of this section has been to es- 
tablish a frame of reference to be extended below in discussion of 
random signal detection. Further discussion of the problem of 
correlation detection, as well as references to the literature on this 
subject, may be found in (22 ], (371. 

While the discussion above has followed the decision-theo- 
retically optimum Bayesian approach to signal detection, situations 
exist where no meaningful values for the required a priori prob- 
abilities can be obtained. It is customary in such situations to 
employ the philosophy of Maximum Likelihood suggested by R. A. 
Fisher. Evaluation of the Likelihood Function, defined as the same 
conditional probability density function p(z/yl) considered above, 
is required. The resulting quantities are weighted by relative costs 


of misclassification and compared, yielding a decision that that.signal 
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Fig. 4-1 .Optimum Receiver for Deterministic Signals 
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yi is present which corresponds to the maximum Likelihood Function 
[8 ]. The essential point is that the Likelihood Function must be 
evaluated in either approach, Hence one may approach the task with 
relatively more confidence that efforts are well directed. 

III, RANDOM SIGNAL DETECTION 

In this section, the ideas developed above will be extended to 
handle detection of Gaussian signals in white Gaussian noise. First 
the approach of Kailath [18 ] will be discussed and its computational 
difficulties noted. Then application of recently developed filtering 
and smoothing techniques to this problem will be considered. 

Here the signal is assumed to comprise a sequence of N samples 
from a Gaussian process whose mean value function and covariance 
function are known. Hence, each signal vector yl possesses a 
multivariate normal distribution with mean vector u= 0 and co- 
variance matrix K1, 


x 


KÎ -р(уіуТ7)-(ві) i,j=1,2....N 
у 1) 


Just as in detection of deterministic signals above, the problem ге- 
duces to evaluation of the Likelihood Function p(z/yl). Since the 
signal and additive noise are assumed statistically independent, the 


received signal covariance is 


Ki = xl +R where R-rI 
2 У 


The Likelihood Function is 


р(2/уі) = 2 т) 5 кі we exp - 1 [21 (кі ) 12] 
2, 2 2. 


It may be expanded using a matrix inversion lemma arising from the 
Frobenius-Schur Relation [7 ]. (Since only a single signal is being 
considered for the moment, notation will be simplified by omitting 


the superscript temporarily.) 


К = лон MUERE NE 
2. Y 
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Tiyu, 
p(z/y) = C exp = [208712 - zl [md (mls e eT); | 


The first term in the exponent does not involve the signal, hence will 


cancel in the decision process, and may be carried as a constant C, 


leaving 
р(2/у) - CC! exp 5 {2 T [rol (RTL + T ) 1 n г) 


Now 


-1 киге 


шла G x RK.” 


1,-1 
, ) 


(R 
— -1 
- + 
к, (K, R) 
— -1 — 
= Ky KE, = À 


It is easily shown [18 ], and included below, that A is the 
weighting matrix yielding the least mean squared error estimate vec- 
tor 7 of y given the observation vector z. 
What is required is the matrix À such that if 
F = Az 
the mean squared error between 7 and y will be minimized. Error 


in estimating the i the component of y is 








N 
Ec - 2 ajj 2, 
Jj Í 
Hence, 
T N аз МЫ 
E Ello 2 > > aj; а 2.2, 
l o 
N N N 
=R (0)-22a,. R (ij) +5 XZ a..a. В... (,Е) 
V j=1 1 у2 ТИЕ ij ik 22 
For a minimum, 
зе} 
x j=l, ° o ө N 
ajj 
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Thus 


N 
Ryy i,j) = Е aik R zz 0.k) 


Note that R T) = š 1 
( a ч ) Вау, (т).) Іп matrix form, 


= = -1 
4 AK — Э A K, K 


Having computed A, the Likelihood Function to be determined takes 


the form, 


П). ЕС exp i (21 А2) 
- СС! ехр > (2 12) 


It is seen that the optimum receiver for detecting Gaussian signals 
forms a cross correlation of the received signal and a least-squares 
estimate of the Gaussian signal given the received signal. As in the 
case of deterministic signal detection, this cross correlation is used 
to evaluate logarithms of the conditional probabilities p(z/yl) 
i-1,... .m, which are in turn furnished to the decision making 
processor. The resulting receiver structure is shown in Fig. 4-2. 
Recalling that the covariance matrices Ky and i are of 
dimension NxN, where N is the number of samples to be processed, 
the computational difficulties in obtaining A = кке are evident. 
Using the fact that K, isa Toeplitz matrix, Trench [33 ] has 
developed an algorithm for obtaining the inverse requiring effort 
proportional only to the square of the matrix dimension, vice its 
cube. Thus the problem is reduced in magnitude, but remains quite 
a task for even moderately large sample sizes, say 200. Unfortunately, 
an even worse problem must still be faced. In the present case of 
random signals, K, isa function of к . The quantity which was 
carried as a constant C (since it would later cancel) in the Like- 
lihood Functions of deterministic signals, must now be evaluated. 


Evaluation of the determinant of К, is involved. Even with exploitation 
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Fig. 4-2 Optimum Receiver for Gaussian Signals 
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of the Toeplitz form of К,, the problem remains a considerable one. 

The signal above was assumed to consist of a sequence of 
samples from a Gaussian random process with known mean (zero) and 
known covariance function. Description of the Gaussian process in 
an equivalent but different manner wil] permit solving the problem 
while avoiding the difficulties above. As discussed in Chapter III, 
the Gaussian process may be represented to any desired degree of 
approximation as the output of a linear dynamical system excited by 
white Gaussian noise. 

Statistical description of the Gaussian message process will 
then be in terms of the mean (zero) and cowariance q of the excitation 
noise, and in terms of matrices I, $, and H defining the input- 
output relation of the linear dynamical system. As before, this 
statistical description of the message process is assumed known. 

An alternate approach to the random signal detection problem may 
be pursued with the message process description above using the 
results of Schweppe [31 ]. It involves developing a difference 
equation for the Likelihood Function (or rather its logarithm), from 
which the complete Likelihood Function, not just its exponent, may 
be determined recursively as signal samples are received. Both the 
problems of inverting the NxN matrix K, and evaluating its 
determinant are avoided. 


Let 


ub 


L(yl) p(z/yl) 
Then 


т 


К 2, 


jon N 
-2 ln L fy") = In (2 T) IK | + 2 b 


Now, 


е!) pepe) cocoa] 


-píz(ü)....z(k-1/yl ]p Uz()/7zQ) . . .z(k-1),y1] 
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Непсе, 


In Ly (yi ) -dn pi (уі) = 1n p[z(k)/z(1) .. . z(k-1), y 1] 


The conditional probability density function 
olaa) o a e e ozko) yt] 


must be considered further. Under the hypothesis that the transmitted 
signal is yi , the matrices describing its statistical characteristics 
are ri, Фі, апа НІ, A Kalman filter using these matrices and oper- 
ating on the received signal data computes values x(k/k-1) and 


P(k/k-1) which define the conditional distribution 
p(x(k)/z() . .. . .z(k-1), y* ) 


= C exp - > [ (x(k) - x(k/k-1))1 P(k/k-1)”1 (x(k)-x(k/k-1))] 


From this, it follows simply that 
p[z(k)/2) . .. . .2(k-1), vi J 


- (2 r) [HP(k/k-1)H Í+ r] E exp - [z(k) - z(k/k-1)]? 
2 (HP(k/k-1)H T^ r) 
Defining 


ы a 


2(К) = z(k) - z(k/k-1) 
yields 
-2 іп ріІг(Ю)/2().... .2(к-1),у11 


-іп2 тінР(к/к-інізі1з к) 
[HP(k/k-1)H Í+ r] 
A difference equation in -2 ln г. (у! ).Һаѕ now been determined. 
The original requirement was to evaluate LN (уі), or, equivalently, 
-2 ln Ly (yl. Hence, the differences are simply summed, k=1, 
2, . . > N. The result is 
T 


DN N - 
-2 in L, (y ) =n (2 r) ED K,z 
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N N 


= У In2T[HP(k/k-1)HT «r ]* X 2(к)2 
k=1 k=] [HP(k/k-1)HI+r] 


It can be seen that -2 In Ly (y!) has been built up from sums of scalar 
quantities, Required quantities (k) and [HP(k/k-1)H Í +r] are 
already available in the Kalman filter. The net result is that the great 
practical difficulties of inverting and evaluating the determinant of a 
large matrix have been avoided by an approach requiring no matrix in- 
version at all. The structure of the optimum receiver using this 
approach is shown in Fig. 4-3. 

On-line computations can be greatly reduced by considering use 
of the Likelihood Function above and some details of filter behavior. 
First of all, since Likelihood Functions for the various alternatives 
are to be compared, the term В In 2 7 will cancel and hence may 

k=] 
be discarded. Further, it frequently happens that the quantity 
[HP(k/k-1)HÍ + r] settles within a few time increments to its steady 
state value. In such cases, all necessary values of filter gain vec- 
tors B(k) and [ HP(k/k-1)HT + r] may be computed beforehand and 
stored with modest memory requirements. This represents quite a 
saving since the greater part of computation required by the Kalman 
filter ds» associated with updating the variance equation and computing 
gains. On-line computations are thus reduced to updating the state 
vector x(k/k) with new observations and adding in the new terms to 
the Likelihood Function evaluation. From the filter equations listed 
in Chapter II, this may be seen to amount to approximately 2n -* 2 
multiply-add combinations per received sample for each classification 


alternative, where n is the order of the model. 
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CHAPTER V 

AN EXPERIMENTAL STUDY IN RANDOM SIGNAL DETECTION 

Finally, utility of theoretical developments in engineering appli- 
cations rests upon ability of the developed techniques to perform 
successfully inithe "real world". Sensitivity of performance to dis- 
crepancies in actual and assumed parameters must be low enough to 
permit satisfactory operation in the face of such discrepancies. In 
terms of the detection/classification scheme discussed in Chapter IV, 
employing recursive evaluation of Likelihood Functions, some points 
to be investigated are: 

(i) Sensitivity to variations in the model quality as measured 
by the ratio of "correlated" to "uncorrelated" components 
of the received signal. 

(ii) Sensitivity of detection performance to discrepancies be- 
tween actual and assumed values of received signal power. 

(iii) Rapidity with which the technique delivers a clearcut 
classification decision. 

In this chapter, the above points are discussed in the context of an 
example problem using actual signals. 

I, DESCRIPTION OF THE PROBLEM 

In the following example, the two-alternative or detection problem 
is studied. Signals used are hydrophone recordings of (a) sea noise 
plus sounds of a diesel submarine, and (b) sea noise alone. In dis- 
cussions below, (a) is referred to as "signal" and (b) as "noise". The 
resulting problem is as follows: given a segment of received signal 
record, produce a Maximum Likelihood judgement as to whether the 
received signalis submarine or noise alone. 

Signals used here were originally in analog form on magnetic 
tape, and were digitized for use in computation using programs de- 
scribed in connection with the example in Chapter III. The basic 
sampling interval was 200 uns. However, as initial analysis disclosed 


little signal or noise energy above 800 cycles or so, only every third 
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sample was employed in computations, extending the sampling interval 
to 600 us. The total effective number of samples per block was thus 
reduced to 1333. Figs. 5-1 and 5-2 display plots of 180 samples of 
signal and noise, respectively. Fig. 5-3 contains a comparative plot 
of sample autocorrelation functions of the signal and noise. 

For study purposes, ten 4000-sample blocks each of signal and 
noise were digitized. The first five blocks of each group were used 
for obtaining models. Testing of the detection procedure was then 
done using the remaining five blocks in each group. 

II, MODEL DETERMINATION 

For ease in determination, the autoregressive form of model was 
selected for the signal and noise sources.. Order of the models was 
chosen to be six, since the signal was known a priori to have three 
prominent "lines" at 240, 390 and 625 cps. For less or no a priori 
knowledge about the signal structure, a procedure might be to begin 
with low order models and successively increase the order until the 
comparative power spectral density plots indicate satisfactory rep- 
resentation. For the present problem, recursive estimation of model 
coefficients as in the example of Chapter III was performed. Resulting 
models were found to be quite consistent from block to block for the 
Signal. Noise models had consistent dominant poles, but somewhat 
scattered secondary poles. Figure 5-4 shows the z- plane pole lo- 
cations for signal models from the first five blocks of signal data. 
Figure 5-5 shows the same information for noise models. 

Selection of a specific model for use in detection might be made 
by, say, averaging the corresponding coefficients of the five sample- 
derived models for each case. A simpler but less precise method 
would be to just select a representative model from each group. The 
latter method was used for the experiments here. Figures 5-6 and 
5-9 contain z-plane pole plots for the selected signal and noise models 
respectively. Figures 5-7 and 5-8 display comparative plots of auto- 


correlation functions and power spectral densities derived from the 
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Fig. 5-1 Sea Noise Plus Diesel Submarine ("Signal") | 
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Fig. 5-2 Sea Noise ("Noise") 
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.. Fig. 5-3 Autocorrelation Function of "Signal" and "Noise" 
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Fig. 5-4 2Z-Plane Poles of Autoregressive Signal Models 
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Fig. 5-5 Z-Plane Poles of Autoregressive Noise Models 
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Fig. 5-6 Poles of Representative Signal Model 
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Fig. 5-7 Autocorrelation Functions from Signal Data 
and Signal Model 
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Fig. 5-8 Power Spectral Densities from Signal Data and 
| Signal Model 
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Fig. 5-9 Poles of Representative Noise Model 
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Fig. 5-10 Autocorrelation Functions from Noise Data and 
Noise Model l 
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Fig. 5-11 Power Spectral Densities from Noise Data 
and Noise Model 
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data sample and model respectively. Figures 5-10 and 5-11 convey 
the same information about the noise model. 
III, STUDY OF PERFORMANCE 

For an honest evaluation of discrimination power of the technique 
being tested, it is important that signal classification be made solely 
on the basis of differences in correlation structure of the alternate 
posbibilities and not on energy levels. For this reason, each batch of 
4000 samples from which sub-blocks of 100 samples were processed by 
the detector was normalized to have a mean squared sample value of 
unity. 

In terms of the discrete signal process model pictured in Fig. 262, 


this means that Е[22] =1. But since 
Е[22 ]= Е[у2]+ г, 


it remains to be determined what is the most likely ratio of E[y2] to 
r. The ratio is a measure of the degree to which the model obtained 
represents the actual observed signal. 

Using a value of .9 for this ratio, the Likelihood Function for 
each alternative is computed as a function of the number of samples 
processed and is plotted in Fig. 5-12. Since the quantity -2 In 
Likelihood Function is actually computed and displayed, higher like- 
lihoods are represented by lower ordinates on the graph. In Fig. 5-12 
samples processed were actually "signal". Hence correct classification 
is indicated by the graph. 

Program QTEST then evaluates the Likelihood Function for each 
classification alternative for values of this ratio from .1 to .9. 
Figures 5-13 to 5-16 display the resulting values of the Likelihood 
Function versus the number of data samples processed. Note that 
correct classification was achieved for all values of the ratio between 
.land.9. It might also be noted that the ratio value of .9 gave the 
greatest margin of correct classification. Correct classification was 
also consistently obtained in the same manner for all blocks of in- 


dependent data (not utilized in model determination). 
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Fig. S-12 Likelihood Functions for Signal Data Coma 
Using Signal and Noise Models 
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Fig . S-13 Likelihood Functions of Signa1 Using Signal Mode1 
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Fig. 5-14 Likelihood Functions of Signal Using Noise Model 
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Fig. 5-15 Likelihood Functions of Noise Using Signal Model 
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Fig. 5-16 Likelihood Functions 


The normalization process described above is roughly equivalent, 
in practical terms, to automatic gain control (AGC). It is important 
that the detection scheme continue to function efficiently under de- 
viations between actual and assumed amplitudes (and corresponding 
mean square values). Hence a test was undertaken to check classi- 
fication efficiency against signals with actual mean square values 
ranging from one fourth to four times the assumed value. Again, con- 
sistent correct classification was maintained. Figures 5-17 to 5-24. 
contain the resulting Likelihood Function plots for actual mean square 
values equal to one fourth and four times the assumed value, in all 


cases equal to unity. 
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Fig. 5-17 Likelihood Functions of Signal Using Signal Model - 
Mean Square Signal Received Equal One-fourth 
Assumed Value 
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Fig. S-18 Likelihood Function of Signal Using Noise Model - 
Mean Square Signal Received Equal One-fourth - 
Assumed Value | 
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Fig. 5-19 Likelihood Function of Noise Using Signal Model - 
Mean Square Signal Received Equal One-fourth 
Assumed Value 
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Fig. 5-20  Likelihood Function of Noise Using Noise Model - 
Mean Square Signal Received Equal One-fourth 
Assumed Value . 
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Fig. 5-21  Likelihood Function of Signal Using Signal Model - 
Mean Square Signal Received Equal Four Times 
Assumed Value 
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 Fig..5-22  Likelihood Function of Signal Using Noise Model - 
Mean Square Signal Received Equal Four Times 
Assumed Value | 


K-SCALE = 2.00E+01 UNITS/INCH. 
Y-SCALE = 1.00E+02 UNITS”INCH. 





Fig. 9-23 Likelihood Function of Noise Using Signal Model - 
Mean Square Signal Received Equal Four Times 
Assumed Value 
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K-SCALE = 2.00E401 UNITS/INCH. 
Y-SCRLE = 1.00400 UNITS/INCH. 





Fig. 9-24 Likelihood Function of Noise Using Noise Model - 
f Mean Square Signal Received Equal Four Times 
Assumed Value 
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CHAPTER VI 
RECOMMENDATIONS FOR FURTHER WORK 

A frequent happening in applied research is that there are as many 
new questions raised as old questions resolved. Such has been the 
case in this instance. 

Perhaps the problem most ripe for attack is that of estimating num- 
erator coefficients of the linear filter model for random time series. 
This is equivalent to estimation of the moving average coefficients in 
a mixed auto-regressive-moving average model. A promising арргоаов 
might lie in numerical processing of the residuals after recursive de- 
termination of the autoregressive coefficients. 

Further study of applications of the methods used here to practical 
problems in signal detection/classification would also be interesting 
and potentially valuable. In particular, multiple-alternative signal 
classification should be interesting to consider. Performance in such 
problems should be improved by better models resulting from successful 
determination of linear filter model numerator coefficients. 

Reliability of detection/classification as a function of record 
length processed and as a function of some measure of difference be- 
tween power spectral densities of alternate classifications should be 
valuable to investigate. For the detection problem, differences between 
power spectral densities of signal plus noise and noise alone relate to 
signal-to-noise ratio in the conventional problem formulation. 

Classification methods used herein also hold potential contri- 
butions ih character recognition. Required for application of the 
methods here are temporally vice spatially distifbuted patterns. How- 
ever, the scanning process used in several present approaches to 
character recognition already produces the necessary spatial-temporal 
transformation. The advantage to be gained should be an insensitivity 
to character translation, a problem which plagues many present 
approaches. 

Finally, the problem of model determination for vector-valued 


random time series remains. Some work on the problem has been done, 
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and is described in [3 ], [30 ]. A comparative study of work referenced 
and the general linear filter model should hold some promise of bene- 
fiting the general analysis of vector-valued random time series both 


in engineering and other disciplines. 
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APPENDIX I 
OPTIMAL ESTIMATION OF AN ARBITRARY LINEAR COMBINATION 
OF SIGNAL GENERATING PROCESS STATE VARIABLES 

Assume the usual model of the signal generating process with 
state variables x(k) and noisy observations z(k) = Hx(k) + u(k). It 
is desired to estimate an arbitrary linear combination of the state 
variables s = Dx based on the noisy observations z. The optimal 
linear estimate (in the Bayes sense, with a quadratic loss function) 
can be shown [17 ], [35 ] to be the expected value of the conditional 


distribution 
рК) (k),-z(k-1), .- ИІ 


Since the process is linear, and since the excitation and noise 
sources are gaussian, the conditional distributions of both x and s 
given measurements Z are gaussian. The problem at hand will be 
solved if the conditional distribution of s given Z can be obtained, 
from which the mean may be extracted. 

Consider first the conditional distribution of x given the 


measurements z., By Bayes rule, 


p[x(k)/z(k), z(k-1), . . . z(1)] 
= р[ (к) /х(к) ,2(к-1), . . 2(1) № [х(к)/2(к-1), . . 201 
plz(k)/z(k-1), . . z(1)] 


As pointed out, the form of the left hand side is known, and is: 


pLx(k)/z(k),z(k-1), . . . zQ)] 
= C exp -š [ (x(k) - x(k/k)) Í P(k/k) 1 (x(k) - x(k/k))] 


On the right hand side, 
p[z(k)/x(k),z(k-1), . . .z(1)]= p[z(k)/x (k) ] 
= C exp -$[(z(k) - Hx(k))? Ro! (zk) - Нх(к))1 
р[х(К)/>(К-1),...>(1)] 
- C exp -&L(x(k) - óx(k-1/k-1)) T P(k/k-1) "1 
(x(k) - &x(k-1/k-1))] 
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pizzas... ei 
= С ехр -4[(2(к) - Н фх(к-1/к-1))1 (нр(к/к-1)нТ + g^! 
(z(k) - H #x(k-1/k-1)) ] 


Defining parameters of the left hand side distribution may be obtained 
from the right hand side by matching terms [26 ]. The Kalman filter 
equations are obtained, which define the left hand side distribution 


parameters recursively, and one has 
x(k)/z(k),z(k-1), . . . . zQ) ^ Nix(k/k), P(k/k)] 


in terms of known quantities. 


In Anderson [1 ], it is proved that 
х ~ Мы, D5] = s=Dx ~N(Dyp, DED Í] 


The theorem includes the cases where x may have either a non- 
Singular or singular distribution and D may be nonsingular or of 
rank less than the dimension of s. 

Invoking the theorem, the optimal estimate of s is seen to be 
Dx(k/k). Hence the optimal estimate of a linear combination of the 
states is the same linear combination of the optimal estimate of the 


states. 
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APPENDIX II 
THE KALMAN FILTER AND ESTIMATION OF A GAUSSIAN MEAN 

The following discussion outlines an interpretation of the Kalman 
filter (discrete case) in which it is considered to be a generalized es- 
timation of a Gaussian population mean. The term "generalized" is 
used to mean that, contrary to usual assumptions in estimation of 
the mean, the population to be estimated here will be permitted to 
move between sampling instants. Movement will be assumed to 
consist of both a prescribed component and a random component. 

The analogy to be described permits achieving an intuitive feel 
for rates of convergence, effects of uncertainty in itlitial conditions, 
and effects of random signal process excitation. Such an intuitive 
feel for filter behavior is frequently difficult to obtain from conven- 
tional formulation of the estimation problem. 

The univariate case will be considered first, for simplicity. The 
same analogy, however, carries over to the multivariate case, and 


will be considered in Brief detail later. 


'No Dynamics" Situation 


To begin with, the classical mean estimation procedure will be 
written in recursive form. Consider a scalar random variable 2 


where 


М(Х т) x = unknown population mean 
r = known population variance 
If Z isthe sample mean of a sample of size n from the population, 
then 


Z N G: iD) where p = £ 
n 


Since the sample mean is the best (minimum variance) estimate of 
the mean of a Gaussian population [17], 2 = Ze 
Let z(1), z(2), . . . . z(k) denote sample values of the random 


variable Z. 
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Then 


77 
N 
E: 

I 


X(k-1) ë [z(k) - 2(k-1)] 


p(k) = РШ. = 0 - 2 )р(к-1) МЕ 3 o. 


K(k) = R(k-1) + b(k) [z(k) - X(k-1)] 
p(k) = [1 - b(k)]p(k-1) та ЖТА ғ 
It will be useful to note that as each new sample value is con- 
sidered, the weightings given it and the previous best estimate are 


in inverse proportion to their respective variances. For example, at 


the k th stage above, 


$ (k) = k-12(k-1) + ل‎ z (k) 
k k 





Var X(k-1) = ` 














Var z(k) = r 
Vs MM E T 
k k-1 
r 4-I 
k-1 
Weighting given to sample value = = = 
r k 
I 
Weighting given to previous best estimate = k = k-1 
5 r k 
k-1 


An alternate view is to note that weightings given each sample 
value and the previous best estimate are:proportional to their relative 
information content, the reciprocal of their relative uncertainties. 
From this point of view, the information content of a current best 
estimate (the reciprocal of its variance) is the sum of the information 
contents of the sample and previous estimate. 

Next, the behavior of a Kalman filter of the following description 


will be compared with that of the sequential estimation procedure above. 
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The restrictions that the transition matrix (a scalar here) be unity 


and that there be no signal process excitation will be relaxed later. 


x = scalar H=h=1 
Ф- ф= 1 R=r 
Oa = D0 pP 


The four equations describing filter behavior are repeated below for 


reference as desired. 


P{k/k-1) = @P(k-1/k-1) « 1+ Q 

B(k) = P(k/k-1)H ? [HP(k/k-1)H T+ R)7} 

x(k/k) = &x(k-1/k-1) + B(k)[z(k) - «х(к-1/к-1)1 
P(k/k) = (I - B(k)H] P(k/k-1) 


Three cases will be discussed, outlining filter behavior under three 
different levels of uncertainty in itiaial estimates (a priori input to 
the filter). Single subscripts will be used in the present section 
Since there is no ambiguity under the conditions assumed, i.e. 
DETRE pR L). 

Case 1, Great uncertainty in initial estimate of x. Let 


p(0) =ar where a = large number 











= T = ааа а = 
b(1) = ph(hph~ +r) an m 1 
р(1) = (1 - b(1)h)p(0) GOS ar 
a+] a+] 
r — ¬ 
b(2) = = z ur b (k) 
p(2) = (1 - #): = 2 | sie) a 


Case 2. Uncertainty in initial estimate of x approximately 


equal to uncertainty in the observation process, i.e. p(0) =r. 











zl m = 1 
к amd Pues d 
p1)-(1-b()]p(0 2—— р(к) =— 

2 К+1 
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Case 3, Uncertainty in initial estimate of x less than uncer- 


tainty in observation process. 
































Let p(0) = where £ iS Some positive integer. 
r er T 1 - 
bu) 5 ( 4 ud £ r(1+/ 1+4 
= (que iy (ig eel ge ےک‎ 2,7 
p(1) = (1 - b(1))p(0) = (1 TW ) 1 үтү ү DU 
b(2) = Lii i + r) -] — 1 1+4 — h 
1+4 1+4 1+4 2+4 2+4 
a2) u BS r =) І 
2+4 1+4 2+4 
b (k) = 1 
k+4 
pk) = — 
k+ 4 


In the Kalman filter for the particular process just described, 
effects of varying degrees of uncertainty in the initial estimate are 
isolated and clearly observed. It can be seen that for great uncer- 
tainty in the initial estimate, the filter behaves asymptotically as the 
sequential estimation procedure outlined, i.e. p(k) and b(k) 

(k =2, 3, .. . Nn) decrease from r and 1 respectively as 


For uncertainty in the initial estimate equal to or less than uncertainty 
imposed by the observation process such that p(0) = r (r= 
positive integer), then the filter data weighting and convergence rate 
are the same as the data weighting and convergence rate of the 
sampling estimation scheme from its (#1)th sample. In other words, 
a priori information may be regarded as providing equivalent additional 
observations of the process to be estimated. 

It is thus clearly shown that under the conditions stated, p and 


b decrease asymptotically as E Interpolating between the assumed 











rational values of the ratio — yields a gain schedule b(k) = 1 
р(0) k+a 
(a = (0) ) for the filter in the simple situation being considered. 
p 
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Unexcited Dynamic Situation 

Next, a more general univariate situation will be considered, In 
the Kalman filter representation, the restriction above that ¢ = 1 
will be relaxed. The corresponding situation in the sequential estimation 
procedure will be that during the sampling process, the population mean 
moves about in a prescribed fashion, i.e. x(k+l) = ¢x(k). As pointed 
out above, the sequential estimation procedure forms a weighted sum 
of the previous best estimate and the new data (sample value), and 
specifies a new estimate variance equal to the "parallel combination" 
of the sample and previous optimal estimate variances. 

The essential difference between the present case with relaxed 
restrictions on ¢ and the case previously discussed is simply that 
just as the population mean is moved between samples, the previous 
best estimate must be adjusted by ø prior to adding in effects of the 
new sample value. 

A double-subscript notation will be helpful here, Let x(k/k-1) 2 
The optimal estimate of x at sample time k given samples up to 
sample time k-l. 
Since x(k) = @x(k-1) , X(k/k-1) = ¢&(k-1/k-1) 
and since X(k-1/k-1) ~N(x(k-1), p(k-1/k-1)), 

X(k/k-1) ~N(x(k), 6p (k-1/k-1)). 


Hence, 
p(k/k) = 1 22 p(k-1/k-1) E rp(k/k-1) 
r * $* p(k-1/k-1) r+ p(k/k-1) 
= b(k)r 


The resulting gains with which X (k/k-1) and z(k) are added are 
p (k/k) p(k/k) 
mü) Sd го 

No additional discussion of the Kalman filter is required for this 
situation, where ¢ is not constrained to be unity. lf the filter a 
priori variance p(0) equals p(k/k-1) of the sequential estimation 


procedure for any k, the filter will behave from the beginning just 
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as the estimation scheme does from the k th stage on. 
Randomly Excited Dynamics | 

The situation may be made still more general by relaxing the 
restriction in A that q = 0. In the sequential estimation procedure, 
the interpretation is that not only will the population mean move in a 
prescribed fashion between sampling instants, but will have an 
additional random motion imparted to it by a gaussian excitation in- 
dependent from sample to sample. 

Again, no additional discussion of the Kalman filter is required. 
Further, treatment of the:present situation in the sequential estimation 
procedure is a direct extension from B. 

The first few steps are 

р(1) =r 
р(1) = 1 
p(2/1) =ó” r + q 


p(2/2) = (ф°г+д)г 


(p^ + 1)r +q 
b(2) = D22) = 22 r + q 
r (@2 + 1)r + q 


ес. 
The steps in general are 
p(k/k-1) = 6° p(k-1/k-1) + q 


p(k/k) = p(k/k-1)r 
p(k/k-1) + r 


b(k) = p(k/k-1) 


p(k/k-1) + r 
The general updating steps agree with those of the Kalman filter, 
Again, if the filter a priori variance p(0) equals p(k/k-1) of the 
sequential estimation procedure for some k, the same remark as 
in B is valid. For the present situation, p does not continue in- 
definitely to decrease, but reaches a non-zero steady state value. 
This steady state value of p and the corresponding steady state 


gain may be computed by setting p(k*1/k-41) = p(k/k) and solving. 
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The Multivariate Case 

Discussion of the multivariate case will be limited to situations 
where H =I. Though this excludes cases of major interest where 
the Kalman filter has been employed, the additional detail required 
would obscure the comparison sought here. 

Under the frequent assumptions of diagonal P(0) and R matrices, 
the case where Ф-І апа О = 0 (по dynamics) really amounts to n 
uncoupled problems of the type covered above, and is of little 
additional interest, Ho has shown [15 ] that P decreases asymp- 
totically as Е for this case without the assumptiin of a diagonal 
P(0). 

The randomly excited dynamic situation is of more general 
interest and is a direct extension of the approach in the univariate 
case. As before, if the filter a priori covariance matrix P(0) = 
P(k/k-1) of the sequential estimation procedure for any k, its 
effect may be regarded as supplying k equivalent additional ob- 
servations to the filter. Adjustment of the optimal estimate covariance 
between samples is: 

P(k/k-1) = &P(k-1/k-1) € '« О. 
Updating of the covariance matrix and computation of the new gain 


is also a direct extension: 


Univariate case: Multivariate case: 
p(k/k) = p(k/k-1)r P(k/k) 7 P(k/k-1)(P(k/k-1) + ЕВ) *Ё 
p(k/k-1) +r 
b(k) = p(k/k) B(k) = P(k/k)R7+ 


r 
A check with the filter equations above will disclose that this is 


exactly the same updating sequence as in the Kalman filter. 
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